Quantitative CT imaging features for COVID-19 evaluation: The ability to differentiate COVID-19 from non-COVID-19 (highly suspected) pneumonia patients during the epidemic period

Objectives COVID-19 and non-COVID-19 (NC) pneumonia showed substantial overlap in CT imaging findings during the pandemic. This study aims to evaluate the effectiveness of image-based quantitative CT features in discriminating COVID-19 from NC pneumonia.
Materials and methods A total of 145 patients with highly suspected COVID-19 were retrospectively enrolled from four centers in Sichuan Province between January 23 and March 23, 2020. Of these, 88 cases were confirmed as COVID-19 and 57 patients were NC. The dataset was randomly divided 3:2 into training and testing sets. Quantitative CT radiomics features were extracted and screened sequentially by correlation analysis, the Mann-Whitney U test, least absolute shrinkage and selection operator (LASSO) logistic regression (LR), and backward stepwise LR with minimum AIC. The selected features were used to construct the LR model for differentiating COVID-19 from NC. The differentiation performance of traditional quantitative CT features, such as lesion volume ratio and ground glass opacity (GGO) or consolidation volume ratio, was also evaluated and compared with the radiomics-based method. Receiver operating characteristic (ROC) curve analysis was conducted to evaluate the predictive performance.
Results Compared with traditional quantitative CT features, radiomics features performed best, with the highest area under the curve (AUC), sensitivity, specificity and accuracy in the training set (0.994, 0.942, 1.0 and 0.965) and the testing set (0.977, 0.944, 0.870 and 0.915) (DeLong test, P < 0.001). Among the CT volume-ratio based models using lesion or GGO component ratio, the model combining CT lesion score and component ratio performed better than the others, with an AUC, sensitivity, specificity and accuracy of 0.84, 0.692, 0.853 and 0.756 in the training set and 0.779, 0.667, 0.826 and 0.729 in the testing set. The significant differences in most of the selected wavelet-transformed radiomics features between COVID-19 and NC may well reflect the CT signs.
Conclusions Differentiation between COVID-19 and NC pneumonia can be improved by using radiomics features rather than traditional quantitative CT values.

c. The variables with abnormal distribution were depicted by mean ± SD.
b. The averaged model performance in the sub-training set among multiple-split dataset.
c. The averaged model performance in the sub-test set among multiple-split dataset.
d. The model's averaged optimism as the difference between the sub-training and sub-test set.
e. The corrected AUC by subtracting the average optimism from the apparent AUC.
c: The adjusted residual. The absolute value of adjusted residual larger than 2.58 was considered to be significant.

Image analysis based on Lung Kit software
In the current study, the radiomics features and several derived quantitative CT features were extracted using Lung Kit software (LK, Version 2.2, GE Healthcare; hereafter LK2.2). The software is a commercial product intended for scientific research only, and its COVID-19 analysis module follows a four-step data processing flow, described as follows.
(1) Image loading and user-defined image preprocessing: In the current research, all images were anonymized to remove patient information and loaded into LK2.2 in NIfTI format. All images were first resampled to an isotropic voxel size of 1 mm × 1 mm × 1 mm using trilinear interpolation. After resampling, the image intensity values were rounded to the nearest integer HU value. A low-pass Gaussian filter with σ = 0.5 was then applied to increase the reproducibility of the radiomics features.
(2) The automatic segmentation of lung lobes and pneumonia lesion:  (upper, middle, bottom). Based on the lung lobe segmentation, the pneumonia lesions were detected and the volume of interest (VOI) was segmented as a whole. Machine-human collaboration was applied to ensure correct segmentation. The margin of the VOI was checked and manually adjusted by an experienced thoracic radiologist (SK. P, a radiology attending doctor with 7 years' experience in interpreting chest CT images), and obviously swollen blood vessels involved in the lesion were excluded if necessary. All automatically segmented or manually adjusted VOIs were checked by a senior radiologist (H. P, a thoracic radiologist with 28 years' experience) to reach consensus. Distributed lesions were treated as a single VOI in the subsequent analysis steps.
(3) Quantification of pneumonia lesions: the volumes of each segmented lung lobe and of the pneumonia lesion were first calculated. The lesion volume ratio and lesion component analyses were then conducted as follows.
For lesion volume ratio analysis, the lesion volume ratio in each of the five lung lobes was calculated automatically by LK2.2 after the lesion VOI was delineated. The lesion ratio in each lung lobe was scored from 0 to 5 according to the volume ratio involved: 0, no lesion; The extracted quantitative CT features and radiomics features were then analyzed and modeled based on the method described in the main text.
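The preprocessing in step (1) is performed inside LK2.2, whose internals are not described here. As a rough illustration only, the same sequence (resampling to 1 mm isotropic voxels with trilinear interpolation, rounding to integer HU, Gaussian smoothing with σ = 0.5) could be sketched with NumPy and SciPy. The function name `preprocess_ct`, the array layout, and the reading of σ as a value in voxels are assumptions, not details from the software.

```python
import numpy as np
from scipy import ndimage


def preprocess_ct(volume_hu, spacing_mm, sigma=0.5):
    """Resample to 1 mm isotropic voxels, round to integer HU, then smooth.

    volume_hu  : 3D array of HU values (hypothetical input layout)
    spacing_mm : original voxel spacing along each array axis, in mm
    sigma      : Gaussian width, assumed to be in voxels (= mm after resampling)
    """
    # Zoom factor per axis = original spacing / target spacing (1 mm).
    zoom = [s / 1.0 for s in spacing_mm]
    # order=1 interpolates linearly along each axis (trilinear in 3D).
    resampled = ndimage.zoom(
        volume_hu.astype(np.float64), zoom, order=1, mode="nearest"
    )
    # Round the interpolated intensities to the nearest integer HU value.
    rounded = np.rint(resampled)
    # Low-pass Gaussian filter applied after rounding, as in the text.
    return ndimage.gaussian_filter(rounded, sigma=sigma)
```

Note that `ndimage.zoom` derives the output grid from the rounded product of shape and zoom factor, so the resulting voxel size is only approximately 1 mm when the original extent is not a whole number of millimetres.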

The method for the 100-times bootstrapping and LGOCV
We conducted 100-times bootstrapping and 100-fold leave-group-out cross-validation (LGOCV, with 60% of the data in each sub-training set) to measure the reliability of the radiomics features included in the final model and the model's overoptimism. The reliability of the radiomics features was measured by how frequently each feature appeared across the multiple splits of training and test sets, while model overoptimism was assessed using the mean AUC in the multiple-split training and test sets.

100-times bootstrapping method
Step 1: Apparent performance. The AUC of the prediction model developed on the whole original training dataset was taken as the apparent performance.
Step 2: Bootstrap sample splitting. The whole original training dataset was repeatedly split into bootstrapped training and test sets. Each bootstrapped training set, with the same sample size as the original training dataset, was constructed by sampling with replacement from the original sample. The out-of-bag samples formed the test set in each bootstrap.
Step 3: Bootstrap model establishment. The starting features were those used to establish the radiomics model on the original training dataset. These features were further selected by backward stepwise logistic regression with minimum AIC (the same as the modeling method in the manuscript) in each bootstrapped training set. The logistic regression model established from each bootstrapped training set was then tested in the corresponding out-of-bag test set. The AUC was obtained for the bootstrapped training and test sets in each bootstrap loop.
Step 4: Model optimism. The model optimism was calculated as the difference between the AUC in the bootstrapped training set and the AUC in the test set.
Step 5: Optimism-corrected performance. Steps 2 to 4 were repeated 100 times. The appearance frequency of each feature and the average optimism were calculated and recorded. The average optimism was then subtracted from the apparent performance in Step 1 to obtain the optimism-corrected performance.
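The five bootstrap steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: `fit` is a hypothetical callable standing in for the backward stepwise logistic regression, returning a scoring function and the names of the features it retained. Redrawing a bootstrap sample when either set lacks one of the two classes is also an assumption, added because the AUC is undefined in that case.

```python
import random
from collections import Counter


def auc(labels, scores):
    """AUC as the chance a positive case outscores a negative one (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_optimism(X, y, fit, n_boot=100, seed=0):
    """Return apparent AUC, mean optimism, corrected AUC and feature counts.

    fit(X, y) is a hypothetical callable returning (predict, selected_features),
    where predict(x) gives a score for a single case x.
    """
    rng = random.Random(seed)
    n = len(y)
    predict, _ = fit(X, y)                                # Step 1: apparent AUC
    apparent = auc(y, [predict(x) for x in X])
    optimism, counts = [], Counter()
    while len(optimism) < n_boot:                         # Step 5: repeat
        in_bag = [rng.randrange(n) for _ in range(n)]     # Step 2: with replacement
        chosen = set(in_bag)
        oob = [i for i in range(n) if i not in chosen]    # out-of-bag test set
        y_tr, y_te = [y[i] for i in in_bag], [y[i] for i in oob]
        if len(set(y_tr)) < 2 or len(set(y_te)) < 2:
            continue  # assumption: redraw when a class is missing
        X_tr, X_te = [X[i] for i in in_bag], [X[i] for i in oob]
        predict, selected = fit(X_tr, y_tr)               # Step 3: refit and select
        counts.update(selected)                           # feature appearance count
        auc_tr = auc(y_tr, [predict(x) for x in X_tr])
        auc_te = auc(y_te, [predict(x) for x in X_te])
        optimism.append(auc_tr - auc_te)                  # Step 4: optimism
    mean_optimism = sum(optimism) / n_boot
    return apparent, mean_optimism, apparent - mean_optimism, counts
```

Dividing each entry of `counts` by `n_boot` gives the appearance frequency of that feature across the bootstrap loops.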

100-fold leave-group-out cross-validation (LGOCV) method
Step 1: Apparent performance. The AUC of the prediction model developed on the whole original training dataset was taken as the apparent performance.
Step 2: Multiple splitting of training and test sets. The whole dataset was repeatedly split into training and test sets. The proportion of data in each sub-training set was 60%, matching the 3:2 randomized stratification ratio used for the original whole dataset. The remaining 40% of the data formed the test set in each loop. This group splitting was repeated 100 times.
Step 3: LGOCV model establishment. The starting features were those used to establish the radiomics model on the original training dataset. These features were further selected by backward stepwise logistic regression with minimum AIC (the same as the modeling method in the manuscript) in each LGOCV training set. The logistic regression model established from each LGOCV training set was then tested in the corresponding LGOCV test set. The AUC was obtained for the LGOCV training and test sets in each loop.
Step 4: Model optimism. The model optimism was calculated as the difference between the AUC in the LGOCV training set and the AUC in the test set.
Step 5: Optimism-corrected performance. Steps 2 to 4 were repeated 100 times. The appearance frequency of each feature and the average optimism were calculated and recorded. The average optimism was then subtracted from the apparent performance in Step 1 to obtain the optimism-corrected performance.
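The repeated 60/40 splitting and the final correction can be illustrated with a short Python sketch. The function names are hypothetical, and splitting within each class (so that every sub-training set keeps roughly the original class balance) is an assumption based on the mention of a randomized stratification ratio; the text does not state how the per-class sizes were rounded.

```python
import random


def lgocv_splits(y, train_frac=0.6, n_splits=100, seed=0):
    """Yield (train_idx, test_idx) for repeated stratified random splits."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    for _ in range(n_splits):
        train, test = [], []
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            cut = round(train_frac * len(shuffled))  # per-class training size
            train += shuffled[:cut]
            test += shuffled[cut:]
        yield sorted(train), sorted(test)


def optimism_corrected(apparent_auc, train_aucs, test_aucs):
    """Subtract the mean (train AUC - test AUC) from the apparent AUC."""
    optimism = [tr - te for tr, te in zip(train_aucs, test_aucs)]
    return apparent_auc - sum(optimism) / len(optimism)
```

In use, a model would be refitted on each `train_idx`, scored on both index sets, and the two resulting AUC lists passed to `optimism_corrected` together with the apparent AUC from Step 1.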